Welcome back.
There we go.
We've been wrapping up decision theory, that is, making decisions while taking
time into account in partially observable environments, and the technique is called POMDPs.
Remember, the idea was that we use Markov processes to model the world, which we've
talked about extensively, and now we do the decision theory on top of that, which
gives us these MDPs, Markov decision processes, where we basically maximize the expected
utility of actions.
And the interesting thing here is that in partially observable Markov
decision processes we can no longer rely on policies as we know them.
Policies don't help us because we don't know where we are:
a policy is something that takes each state and tells you what to do optimally in it.
If we don't know which state we are in, that's useless.
So we need to do something else, and that's what we're slowly but surely developing.
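To make the recap concrete, here is a minimal sketch of a policy as a plain lookup table from states to actions; the 4x3 grid world, the state labels, and the chosen actions are illustrative assumptions, not taken from the lecture.

```python
# Minimal sketch: an MDP policy is just a mapping state -> action.
# Hypothetical 4x3 grid world, states labeled (column, row), compass-move actions.
policy = {
    (1, 1): "N", (1, 2): "N", (1, 3): "E",
    (2, 1): "W", (2, 3): "E",
    (3, 1): "W", (3, 2): "N", (3, 3): "E",
    (4, 1): "W",
}

def act(state):
    """In a fully observable MDP we simply look up the action for the current state."""
    return policy[state]

print(act((1, 1)))  # works because we know exactly which state we are in;
# under partial observability there is nothing to index the table with.
```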
We were kind of in the middle.
Why aren't you doing anything?
Because something is...
There we go.
And to go partially observable, we've updated our example with a new feature, namely that
we have a noisy sensor.
Instead of a fully observable environment where we just know where we are, where we
can observe the state, we have a sensor that counts the adjacent walls, which by itself
already gives only partial observability.
We can only observe the walls, so we cannot distinguish between this state and that state:
they have the same number of adjacent walls.
One has a wall left and right, the other one has a wall left and top, or west and north.
And to make things even worse, we're going to add noise.
Okay?
So what we typically have here is a sensor model, or observation model, whatever you
want to call it. We want to assume the sensor Markov property and that the sensor model
is stationary, so the sensor doesn't change its characteristics over time, but we still
don't know where we are.
Okay?
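Here is a minimal sketch of such a noisy wall-count sensor model. The 4x3 grid layout, the blocked cell, the noise level EPSILON, and all helper names are my own illustrative assumptions; the point is that the observation probability depends only on the current state, which is exactly the sensor Markov property and stationarity.

```python
import random

# Hypothetical 4x3 grid world, state = (col, row), with one blocked cell.
COLS, ROWS = 4, 3
BLOCKED = {(2, 2)}
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
          if (c, r) not in BLOCKED]

def wall_count(state):
    """Number of adjacent walls (grid boundary or obstacle) around a state."""
    c, r = state
    neighbours = [(c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)]
    return sum(1 for (nc, nr) in neighbours
               if not (1 <= nc <= COLS and 1 <= nr <= ROWS) or (nc, nr) in BLOCKED)

EPSILON = 0.1  # assumed noise level for illustration

def sensor_model(observation, state):
    """P(observation | state): stationary, depends only on the current state."""
    if observation == wall_count(state):
        return 1.0 - EPSILON
    return EPSILON / 4  # any of the four wrong counts, uniformly

def sense(state):
    """Draw one noisy wall-count reading in the given state."""
    true = wall_count(state)
    if random.random() < 1.0 - EPSILON:
        return true
    return random.choice([o for o in range(5) if o != true])
```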
So the question is, how do we deal with this?
And we've been relying on the theorem by Åström that the optimal policy in a
POMDP is not a function on the states, right?
Normally a policy is something that tells you: this is a state, do that; but here it's a
function on the belief state.
So we're lifting everything up one level.
Instead of having single states, we have belief states, which are probability distributions
over states.
So the idea that we're going to pursue, and have been pursuing, is to basically convert a
POMDP into an MDP in belief space, and the point there is that we can then use MDP methods
because the belief state is fully observable.
Why?
Pardon me?
The point here is that you always know what you believe, right?
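A minimal sketch of why that is: the new belief state is computed entirely from things the agent already knows, namely its previous belief, the action it took, the transition model, and the percept it just received; the hidden state never enters. This continues the hypothetical grid world and sensor_model from the earlier sketch, and the deterministic transition_model here is a placeholder assumption.

```python
def transition_model(s2, s, action):
    """Placeholder P(s' | s, a): deterministic compass move, bumping into a wall stays put."""
    c, r = s
    dc, dr = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[action]
    target = (c + dc, r + dr)
    if target not in STATES:
        target = s
    return 1.0 if s2 == target else 0.0

def update_belief(belief, action, observation):
    """One filtering step: b'(s') ∝ P(obs | s') * sum_s P(s' | s, a) * b(s).

    Everything on the right-hand side is known to the agent, so the belief
    state is 'fully observable' even though the real state is not.
    """
    predicted = {s2: sum(transition_model(s2, s, action) * belief[s] for s in belief)
                 for s2 in STATES}
    unnorm = {s2: sensor_model(observation, s2) * p for s2, p in predicted.items()}
    total = sum(unnorm.values()) or 1.0
    return {s2: p / total for s2, p in unnorm.items()}

# Start with a uniform belief ("no idea where we are"), move north, perceive 2 walls.
uniform = {s: 1.0 / len(STATES) for s in STATES}
b1 = update_belief(uniform, "N", 2)
print(max(b1, key=b1.get), round(max(b1.values()), 3))
```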